1 Overview


FungiExpresZ is a browser-based user interface (developed in R Shiny) to analyse and visualize gene expression data. It allows users to visualize their own gene expression data as well as more than 13,000 pre-processed fungal gene expression datasets from SRA. Users can also merge their data with SRA data for combined analysis and visualization. By uploading just a gene expression matrix (a .txt file where rows are genes and columns are samples), a user can generate 12 different exploratory visualizations and 6 different GO visualizations. Optionally, a user can upload multiple gene groups and sample groups to compare between them. A user can select a set of genes directly from a scatter plot, line plot or heatmap and pass them on for GO analysis and GO visualization. GO analysis and visualization are available for more than 100 fungal species and are implemented through the popular R package clusterProfiler (Yu et al. 2012).

2 Key features


2.1 More than 13,000 NCBI-SRA data from 8 different fungal species

FungiExpresZ provides normalized gene expression values (FPKM) for more than 13,000 SRA samples. A user can select one or more SRA samples for visualization. SRA data can be searched by species, genotype, strain or free text, which is matched against several SRA columns to find relevant hits.

species                             #sra_samples
Aspergillus nidulans FGSC A4        151
Candida albicans SC5314             639
Saccharomyces cerevisiae            11872
Aspergillus fumigatus Af293         242
Aspergillus niger CBS 513.88        253
Candida glabrata CBS 138            126
Talaromyces marneffei ATCC 18224    26
Candida auris B8441                 46

NOTE: We are continuously processing fungal SRA data. This table will be updated as we add new data.

2.2 Visualize user supplied gene expression data with or without integration of SRA data

Users can analyze and visualize their own data by uploading a .txt/.csv file (columns are samples and rows are genes). Optionally, user data can be integrated with selected SRA data for combined analysis and visualization.

2.3 Visualize multiple gene groups and sample groups in a single plot

Optionally, users can upload sample groups (e.g. replicates, control vs treatment, wild type vs deletion) and multiple gene groups to compare between them. Group information, once uploaded, can be reused across several plots through the fill and facet plot attributes to build more complex visualizations.

2.4 Twelve different data exploratory visualizations

FungiExpresZ provides a user-friendly, browser-based interface for generating 12 different publication-ready, ggplot2-based visualizations. Users can adjust common plot attributes such as plot title, axis titles, font size, plot theme, legend size and legend position, as well as a few plot-specific attributes. The currently available plots are:

  1. Scatter Plot
  2. Multi-Scatter Plot
  3. Corr Heat Box
  4. Density Plot
  5. Histogram
  6. Joy Plot
  7. Box Plot
  8. Violin Plot
  9. Bar Plot
  10. PCA Plot
  11. Line Plot
  12. Heatmap

2.5 Supports Gene Ontology (GO) enrichment and visualizations for more than 100 different fungal species

FungiExpresZ allows users to define gene set(s) directly from a plot (scatter plot, line plot or heatmap) to perform gene ontology enrichment and visualization. The available GO visualizations are:

  1. Emap plot
  2. Cnet plot
  3. Dot plot
  4. Bar plot
  5. Heat plot
  6. Upset plot

3 Installation


FungiExpresZ can also be installed locally as an R package or a Docker image. Please follow the instructions given on GitHub to install it on a local computer.

4 Example data


The plots in this document were generated from cartoon gene expression data.

4.1 Expression matrix

The expression matrix can be downloaded from the file given here [Download]. It contains 4 samples, each with 3 replicates. Column names and their descriptions are given in the table below.

Column names     Description
gene_id          Gene id, unique to each row
Control_Rep.A    Normalised FPKM values
Control_Rep.B    Normalised FPKM values
Control_Rep.C    Normalised FPKM values
Treat1_Rep.A     Normalised FPKM values
Treat1_Rep.B     Normalised FPKM values
Treat1_Rep.C     Normalised FPKM values
Treat2_Rep.A     Normalised FPKM values
Treat2_Rep.B     Normalised FPKM values
Treat2_Rep.C     Normalised FPKM values
Treat3_Rep.A     Normalised FPKM values
Treat3_Rep.B     Normalised FPKM values
Treat3_Rep.C     Normalised FPKM values
Control_Mean     Mean FPKM of control replicates A, B, C
Treat1_Mean      Mean FPKM of Treatment1 replicates A, B, C
Treat2_Mean      Mean FPKM of Treatment2 replicates A, B, C
Treat3_Mean      Mean FPKM of Treatment3 replicates A, B, C
fc_treat1        log2FC(Treat1_Mean/Control_Mean)
fc_treat2        log2FC(Treat2_Mean/Control_Mean)
fc_treat3        log2FC(Treat3_Mean/Control_Mean)
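
The derived columns in the table above (replicate means and log2 fold changes) can be recomputed from the replicate columns. The following is a minimal Python sketch, not the code used to build the example file. The function name `add_summary_columns` and the `pseudocount` parameter are hypothetical; the document does not state whether a pseudocount was added before taking the log ratio, so the default of 0 simply follows the formula in the table.

```python
import math
from statistics import mean


def add_summary_columns(row, control="Control",
                        treatments=("Treat1", "Treat2", "Treat3"),
                        replicates=("A", "B", "C"), pseudocount=0.0):
    """Return a copy of `row` with <sample>_Mean and fc_<treatment> columns added.

    `row` maps column names (e.g. 'Treat1_Rep.A') to FPKM values.
    `pseudocount` is an assumption for illustration: with the default of 0,
    a mean of exactly 0 raises an error instead of producing a value.
    """
    out = dict(row)
    out[f"{control}_Mean"] = mean(row[f"{control}_Rep.{r}"] for r in replicates)
    for treatment in treatments:
        out[f"{treatment}_Mean"] = mean(
            row[f"{treatment}_Rep.{r}"] for r in replicates
        )
        ratio = (out[f"{treatment}_Mean"] + pseudocount) / (
            out[f"{control}_Mean"] + pseudocount
        )
        out[f"fc_{treatment.lower()}"] = math.log2(ratio)
    return out


# Made-up values for a single gene, for illustration only.
example_row = {
    "gene_id": "gene_1",
    "Control_Rep.A": 10.0, "Control_Rep.B": 10.0, "Control_Rep.C": 10.0,
    "Treat1_Rep.A": 20.0, "Treat1_Rep.B": 20.0, "Treat1_Rep.C": 20.0,
    "Treat2_Rep.A": 5.0, "Treat2_Rep.B": 5.0, "Treat2_Rep.C": 5.0,
    "Treat3_Rep.A": 10.0, "Treat3_Rep.B": 10.0, "Treat3_Rep.C": 10.0,
}
summary = add_summary_columns(example_row)
```

A positive `fc_*` value means the treatment mean is higher than the control mean, a negative value means it is lower, and 0 means no change.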

4.2 Sample groups

The sample group file contains two columns.

Columns          Description
group_name       User-given name for each sample (column) group. Values in this column can be repeated.
group_members    Values in this column must be column names that identify samples in the expression matrix file. Each value must be unique in this column.

Here, we have grouped samples by replicates. The file can be downloaded from here [Download].
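
A sample group file of this shape can be derived from the column names of the expression matrix. The sketch below is one possible way to do it in Python. It assumes that replicate columns follow the `<sample>_Rep.<letter>` naming of the example matrix, that the file is tab-delimited, and that it starts with a `group_name`/`group_members` header row; the document does not confirm the delimiter or the header requirement, and both function names are hypothetical.

```python
import csv
import io


def sample_groups_from_columns(column_names, id_column="gene_id"):
    """Pair each replicate column with a group name taken from its prefix.

    'Treat1_Rep.A' is assigned to group 'Treat1'. The gene id column and
    derived columns (means, fold changes) are skipped.
    """
    rows = []
    for name in column_names:
        if name == id_column or "_Rep." not in name:
            continue
        group = name.split("_Rep.")[0]
        rows.append((group, name))
    return rows


def write_group_file(rows, handle):
    """Write (group_name, group_members) rows as tab-delimited text (assumed format)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["group_name", "group_members"])
    writer.writerows(rows)


columns = ["gene_id", "Control_Rep.A", "Control_Rep.B", "Control_Rep.C",
           "Treat1_Rep.A", "Treat1_Rep.B", "Treat1_Rep.C",
           "Control_Mean", "Treat1_Mean", "fc_treat1"]
groups = sample_groups_from_columns(columns)

buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
write_group_file(groups, buffer)
```

Group names repeat (one row per member), while each member appears only once, which matches the constraints described for the two columns.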

4.3 Gene groups

The gene group file contains two columns.

Columns          Description
group_name       User-given name for each gene (row) group. Values in this column can be repeated.
group_members    Values in this column must come from the first column (the row identity) of the expression matrix file. Each value must be unique in this column.

Here, we have grouped genes by:

  1. Fold change comparison - Treatment (1, 2 or 3)/Control.
  • Three groups in each comparison
    • UP
    • DOWN
    • NC
  2. Fold change status in two different comparisons.
  • Nine groups in each category
    • UP_UP
    • UP_DOWN
    • UP_NC
    • DOWN_DOWN
    • DOWN_UP
    • DOWN_NC
    • NC_NC
    • NC_UP
    • NC_DOWN

Gene group files can be downloaded from the links given in the table below.

Gene group files                Description
gene group file 1 [Download]    gene groups by fold change Treat1/control
gene group file 2 [Download]    gene groups by fold change Treat2/control
gene group file 3 [Download]    gene groups by fold change Treat3/control
gene group file 4 [Download]    gene groups by fold status Treat1/control vs Treat2/control
gene group file 5 [Download]    gene groups by fold status Treat2/control vs Treat3/control
gene group file 6 [Download]    gene groups by fold status Treat1/control vs Treat3/control
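
The UP/DOWN/NC grouping and the combined two-comparison status described above can be derived from log2 fold change values. The following Python sketch illustrates the idea under stated assumptions: the cutoff of 1 (a two-fold change) is hypothetical, since the document does not give the threshold used for the example files, and the function names are illustrative only.

```python
def fold_change_status(log2fc, cutoff=1.0):
    """Classify a log2 fold change as 'UP', 'DOWN' or 'NC' (no change).

    `cutoff` is an assumed threshold; the example files may use another one.
    """
    if log2fc >= cutoff:
        return "UP"
    if log2fc <= -cutoff:
        return "DOWN"
    return "NC"


def combined_status(log2fc_a, log2fc_b, cutoff=1.0):
    """Join the status in two comparisons, e.g. 'UP_DOWN'."""
    return f"{fold_change_status(log2fc_a, cutoff)}_{fold_change_status(log2fc_b, cutoff)}"


def gene_group_rows(fold_changes, cutoff=1.0):
    """Build (group_name, group_members) rows from {gene_id: (fc_a, fc_b)}."""
    return [
        (combined_status(fc_a, fc_b, cutoff), gene_id)
        for gene_id, (fc_a, fc_b) in fold_changes.items()
    ]


# Made-up log2 fold changes for two comparisons, for illustration only.
example = {
    "gene_1": (1.5, -2.0),
    "gene_2": (0.3, 0.1),
    "gene_3": (-1.2, 2.4),
}
rows = gene_group_rows(example)

# Every label the two-comparison scheme can produce (3 x 3 combinations).
all_labels = {f"{a}_{b}" for a in ("UP", "DOWN", "NC") for b in ("UP", "DOWN", "NC")}
```

Three statuses per comparison give nine combined groups, which is where the nine groups listed above come from.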

5 Exploratory example plots

The plots below can be generated by uploading the data files given above to FungiExpresZ.

5.1 Scatter plot

A scatter plot can be used to display the pairwise correlation between 2 samples. Users can color dots either by density (default) or by gene groups.

Scatter plot: dots color by density (left) and color by gene groups (right)

5.2 Multi-Scatter plot

A multi-scatter plot can be used to display pairwise correlations between more than 2 samples (recommended for showing correlation between replicate samples). The lower half of the plot shows scatter plots, while the upper half shows correlation values. The diagonal displays the distribution of each sample as a density plot. As the number of samples increases, the number of panels in a single graphical device grows quadratically (n samples give an n x n grid), which makes the image crowded and harder to interpret. Therefore, a multi-scatter plot is limited to a maximum of 5 samples. The correlation heat-box is an alternative that shows correlations as a heatmap for more than 5 samples.

Multi scatter plot: pairwise correlation between replicate pairs

5.3 CorrHeatBox

CorrHeatBox displays pairwise correlations in the form of a heatmap.

Correlation heatbox: represented as square (left) and circle (right)

Correlation heatbox: represented as upper half (left) and lower half (right)
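
The values shown in a correlation heatbox form a symmetric sample-by-sample matrix. The Python sketch below computes such a matrix with the Pearson coefficient. This is an assumption for illustration: the document does not state which correlation method FungiExpresZ uses, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of expression values.

    Raises ZeroDivisionError if either list is constant (zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(samples):
    """Return {sample: {sample: r}} for every pair of samples.

    `samples` maps a sample name to its expression values, with genes in
    the same order for every sample.
    """
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


# Made-up expression values for four genes in three samples.
expression = {
    "Control_Rep.A": [1.0, 2.0, 3.0, 4.0],
    "Control_Rep.B": [2.0, 4.0, 6.0, 8.0],
    "Treat1_Rep.A": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(expression)
```

Because the matrix is symmetric, showing only the upper or lower half, as in the second figure, loses no information.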

5.4 Density plot

A density plot can be used to display the distribution of an individual sample, sample groups or gene groups.

Density plot: distribution of single sample single gene group (left) and multiple samples single gene group (right)

Density plot: distribution of single sample multiple gene groups (left) and multiple samples multiple gene groups (right).

5.5 Histogram

A histogram can be used to display the frequency counts of an individual sample, sample groups or gene groups.

Histogram: frequency of single sample single gene group (left) and multiple samples single gene group (right)

Histogram: frequency of single sample multiple gene groups (left) and multiple samples multiple gene groups (right)

5.6 Joy plot

A joy plot can be used to display the distribution of an individual sample, sample groups or gene groups. By separating multiple variables along the Y axis, it overcomes the limitation of a normal density plot.

Joy plot: multiple samples single gene group color by probability (left) and color by quantile (right)

Joy plot: multiple sample groups single gene group (left) and multiple samples multiple gene groups (right)

5.7 Box plot

A box plot can be used to display the distribution of observations and their quantiles for individual samples, sample groups or gene groups.

Box plot: multiple samples colored by samples (left) and colored by sample groups (right)

Box plot: multiple samples multiple sample groups (left) and multiple samples multiple gene groups (right)

5.8 Violin plot

Similar to a box plot, a violin plot can be used to display the distribution of observations and their quantiles for individual samples, sample groups or gene groups.

Violin plot: multiple samples colored by samples (left) and colored by sample groups (right)

Violin plot: multiple samples multiple sample groups (left) and multiple samples multiple gene groups (right)

5.9 Bar plot

A bar plot can be used to display the expression of individual genes across multiple samples, sample groups and gene groups.

Bar plot: expression of individual genes across samples colors by genes (left) and colors by genes and faceted by sample groups (right)

5.10 PCA plot

A PCA plot can be used to display similarities and differences between samples and sample groups using principal components.

PCA plot: color by sample groups (left) and color by k-means (right)

5.11 Line plot

A line plot can be used to display gene expression trends across multiple samples. Users can group observations either by k-means or by predefined gene groups.

Line plot: k-means clusters individual gene (left) and cluster mean (right)

5.12 Heatmap

A heatmap can be used to display gene expression trends across multiple samples. Users can group genes and samples by k-means, or by predefined gene groups or sample groups.

Heatmap: row clusters by k-means (left) and row clusters by gene groups (right)

Heatmap: along with column box plot on top (left) and parallel row standard deviation heatmap (right)

Heatmap: row clusters sorted by standard deviation (left) and columns clustered by sample groups (right)

6 GO example plots

Users can select genes or gene clusters from a scatter plot, line plot or heatmap and pass them to GO enrichment followed by GO visualization. If the data are uploaded by the user, the gene IDs (first column of the file) must match the gene IDs of the selected species.

6.1 GO dotplot

GO dotplot

6.2 GO barplot

GO barplot

6.3 GO heatplot

GO heatplot

6.4 GO emapplot

GO emapplot

6.5 GO cnetplot

GO cnetplot

6.6 GO upsetplot

GO upsetplot

References

Yu, Guangchuang, Li-Gen Wang, Yanyan Han, and Qing-Yu He. 2012. “clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters.” OMICS: A Journal of Integrative Biology. https://doi.org/10.1089/omi.2011.0118.